-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: fix node tests fail intermittently #4044
Merged
bruce-riley
merged 1 commit into
wormhole-foundation:main
from
wormholelabs-xyz:tilt_node_tests_fail_intermittently
Jul 31, 2024
Merged
ci: fix node tests fail intermittently #4044
bruce-riley
merged 1 commit into
wormhole-foundation:main
from
wormholelabs-xyz:tilt_node_tests_fail_intermittently
Jul 31, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
bruce-riley
changed the title
wormchain: change ioutil to io and os (#3970)
Tilt: Node tests fail intermittently
Jul 30, 2024
bruce-riley
force-pushed
the
tilt_node_tests_fail_intermittently
branch
4 times, most recently
from
July 30, 2024 21:44
3a76f0f
to
57ccd9f
Compare
SEJeff
previously approved these changes
Jul 30, 2024
bingyuyap
previously approved these changes
Jul 31, 2024
evan-gray
changed the title
Tilt: Node tests fail intermittently
ci: node tests fail intermittently
Jul 31, 2024
evan-gray
force-pushed
the
tilt_node_tests_fail_intermittently
branch
from
July 31, 2024 13:07
57ccd9f
to
0c4b3d0
Compare
evan-gray
force-pushed
the
tilt_node_tests_fail_intermittently
branch
from
July 31, 2024 13:11
0c4b3d0
to
069e1f0
Compare
evan-gray
force-pushed
the
tilt_node_tests_fail_intermittently
branch
from
July 31, 2024 13:11
069e1f0
to
3c2e0e7
Compare
evan-gray
changed the title
ci: node tests fail intermittently
ci: fix node tests fail intermittently
Jul 31, 2024
evan-gray
approved these changes
Jul 31, 2024
bingyuyap
approved these changes
Jul 31, 2024
SEJeff
approved these changes
Jul 31, 2024
jtieri
pushed a commit
to strangelove-ventures/wormhole
that referenced
this pull request
Aug 23, 2024
* Node metric and performance tweaks (wormhole-foundation#3967) * Node: Metric and performance tweaks * Tweak observation metric * pubsub StrictNoSign * bigger observation metric buckets * biggerer observation metric buckets * Node: Add metric for libp2p drops * Disable StrictNoSign for now --------- Co-authored-by: Evan Gray <battledingo@gmail.com> * node: Allow tokens to change their symbol in generator script * Node/CCQ: Allow anything in testnet (wormhole-foundation#3966) * node: update xlabs testnet guardian endpoints * clients/js: pin tsx version * ci: evm relayer address fix * node: Governor token list update * Node: Minor tweaks and spy improvement (wormhole-foundation#3974) * Node: Minor tweaks and spy improvement * Add tests * node: Token generator script flags tokens with changed symbols * Deploy Gnosis Chiado Testnet (wormhole-foundation#3987) * node: Update flow cancel tokens (wormhole-foundation#3986) - Add canonical Solana and Ethereum stablecoins - Remove Aave USDC * node: governor custom reset release timer delay (wormhole-foundation#3983) * Update ChainGovernorResetReleaseTimerRequest protobuf message * Add numDays argument to governor-reset-release-timer command * Update governor backend to support numDays argument * Address review comments * Add test for resetReleaseTimerForTime()'s numDays parameter * Address review comments * Add adminrpc test for ChainGovernorResetReleaseTimer * Replace hardcoded upper boundaries with maxResetReleaseTimerDays * Update governor whitepaper to reflect the new argument * Added default value to governor whitepaper --------- Co-authored-by: Jason Matthyser <jason@asymmetric.re> * node: Fix bug in flow cancel mechanism where the wrong values were being used for tokenEntry (wormhole-foundation#3990) * node: add more unit tests for flow cancel * node: Fix tokenEntry indexing issue in Governor flow cancel logic - Indexes the tokenEntry for flow cancel tokens properly (using the token's origin chain and origin address - Add many more tests to check flow cancel logic in more detail and at the resolution of the ProcessMsgForTime method. Big thanks to Max for helping to debug the issue and the unit tests. * node: add check to ensure the governor usage is not zero * node: Change flow cancel test so that origin and emitter chain are different * node: Add unit test for partial flow cancel Add additional test for flow cancel mechanism where the numbers do not cleanly cancel out. * node: fix lint issues in governor test * node: Add emitters for Lido wstETH NTT deployment to NTT Accountant mainnet tracking (wormhole-foundation#3997) * docs: update governor docs to reflect reset-release-timer change (wormhole-foundation#3999) * docs: update governor docs to reflect reset-release-timer change * docs: fix spelling mistake in governor docs --------- Co-authored-by: Jason Matthyser <jason@asymmetric.re> * node: governor token list update * Node: p2p.Run interface change (wormhole-foundation#3996) * Node: p2p.Run interface change * Guardian should be able to set disableHeartbeatVerify * node: Fix issue where transfers that were loaded from the DB did not add a flow-cancel transfer on the TargetChain (wormhole-foundation#4002) node: Fix issue where transfers that were loaded from the DB did not add a flow-cancel transfer on the TargetChain Flow-canceling is done in the `ProcessMsgForTime` loop when a new message occurs. However, this was not done when a node restarted and reloaded transfers from the past 24 hours. As a result it was possible for the node to calculate a result that showed that the outgoing transfers for an emitter chain exceeded the daily limit. In effect this is true but only with the condition that there was incoming flow to allow this to happen. This appeared to violate an invariant and so the node did not start properly. Add unit tests when reloading flow cancel transactions from the database * Node/GWRelayer: Should check emitter for ibc transfers (wormhole-foundation#4007) * Node/EVM: Add more cases to canRetryGetBlockTime (wormhole-foundation#4010) * Node/EVM: Add more cases to canRetryGetBlockTime * Code review rework * node/hack: Update Sui RPC endpoint * Node/Acct: Remove obsolete pending transfers from db (wormhole-foundation#4003) * Deploy Berachain V2 testnet (wormhole-foundation#4020) * node: governor token list update * Update issue templates (wormhole-foundation#4024) * Update issue templates removed unused tracker, split bug reports into guardian vs non-guardian * Update bug_report_guardian.md smaller guardian bug report template * node: remove sui websocket from watcher * docs: Update CONTRIBUTING.md to forbid low-effort dependency changes * docs: Typo in CONTRIBUTING * sdk/js-query: add signaturesToSolanaArray util * chore(node/cmd): replace `fmt.Errorf` without parameters with `errors.New` (wormhole-foundation#4030) * node: devcontainer support * cosmwasm: add missing feature flags in Cargo.toml files These flags were previously not declared, which is now reported by clippy. Also fixed the docstring indentation, which clippy also complains about now. * Deploy Snaxchain to testnet (wormhole-foundation#4035) * Deploy Snaxchain to testnet * Code review rework * Input Validation on Generated Token List (wormhole-foundation#4026) * Add input validation on the generated go code * Change denylist and comment associated with it * Add changes to the text file and more restrictive chars * Removing unnecessary async code --------- Co-authored-by: Maxwell Dulin <strikeout@maxwells-mbp.lan> Co-authored-by: Maxwell Dulin <strikeout@Maxwells-MacBook-Pro.local> * Tilt: Various fixes * Tilt: Reenable query tests (wormhole-foundation#4040) * node/evm: retry tx once before timing out * node: Fix lint violation in hack/repair_eth/repair_eth.go * acala: update testnet contracts and VAA due to network reset * acala/karura: update testnet environments * Node: Minor logging changes * node/cleanup: Add some documentation and an ignore-list to mainnet_chains_test.go (wormhole-foundation#4038) * node: Add test to ensure that governed chains also have governed assets * node: Update test that checks if governed chains have governed assets - Verify tokens against the Governor's mainnet chains rather than querying against the values configured in the SDK. (These should be the same but it avoid a package import and ties in more closely to the Governor's functionality.) - Add test message on failure to help with debugging - Remove outdated exceptions to existing the unit test - Merge test from previous commit into a similar unit test that already existed * node: Revert chainList() so that its scope is private * node: Flow cancel enhancements and bug fixes (wormhole-foundation#4016) * node: Fix issue where transfers that were loaded from the DB did not add a flow-cancel transfer on the TargetChain Flow-canceling is done in the `ProcessMsgForTime` loop when a new message occurs. However, this was not done when a node restarted and reloaded transfers from the past 24 hours. As a result it was possible for the node to calculate a result that showed that the outgoing transfers for an emitter chain exceeded the daily limit. In effect this is true but only with the condition that there was incoming flow to allow this to happen. This appeared to violate an invariant and so the node did not start properly. node: Add unit tests when reloading flow cancel transactions from the database node: fix lint errors in governor_test.go * node: Add a command-line flag to enable or disable flow-canceling on restart Added a command-line flag to enable or disable flow-canceling when starting the node. This should allow Guardians to disable flow canceling in the case of future bugs or during a security incident. This should prevent the need to rollback to earlier Guardian versions. (@mdulin2 ) * node: Use deterministic iteration order over chains when changing Governor state - Adds a field that stores a sorted slice of chain IDs to the governor. - Use this field to iterate in a determinstic order when performing actions that change the state of the Governor - This should help Guardians reach a more similar view of the Governor in scenarios where iteration order might impact whether a transfer is queued. (This is relevant especially in the case of Flow Canceling) - Cases where only a single VAA is being modified were not changed. Iteration order should not matter here and determinstic order may may worse for performance when searching for a particular element. * node: Fix tokenEntry when checking flow cancel for pending transfers (Squash and merge bug fix from PR wormhole-foundation#4001) Similar to a previous issue in the function `ProcessMsgForTime`, the tokenEntry was not being generated properly. This should result in queued "small transfers" being able to flow cancel when they are released from the queue. Also adds a comment on the CheckedInt64 function to indicate what its error states mean and when they occur. Add comments and change variable names for governor_monitoring - Add function comments to explain what they do and what their error states mean - Adds governor logging to error cases - Change variable names in publishStatus function. `value` was used first to indicate the "governor usage" and then reused to indicate the remaining available notional value for a chain. This refactor tries to make it clear that these are different concepts Add unit test for flow cancelling when a pending transfer is released - Add a unit test to ensure that, when a pending transfer is released, it also does flow-cancelling on the TargetChain (previously we had a bug here) - Add documentation for CheckPendingForTime to clarify that it has side-effects * node: Modify error handling for CheckPending method in the Governor Previous rollouts of the Flow Cancel feature contained issues when calculating the Governor usage when usage was near the daily limit. This caused an invariant to be violated. However, this was propagated to the processor code and resulted in the processor restarting the entire process. Instead, the Governor should simply fail-closed and report that there is no remaining capacity, causing further VAAs to be queued until the usage diminishes over time. The circumstances leading to the invariant violations are not addressed in this commit. Instead this commit reworks the way errors are handled by the CheckPending, making careful choices about when the process should or should not be killed. - Change "invariant" error handling: instead of causing the process to die, log an error and skip further for a single chain while allowing processing for other chains to continue - Remove 'invariant error' in TrimAndSumValueForChain as it can occur somewhat regularly with the addition of the flow cancel feature - Return dailyLimit in error condition rather than 0 so that future transfers will be queued - Do not cap the sum returned from TrimAndSumValueForChain: instead allow it to exceed the daily limit. - Modify unit tests to reflect this - Add unit tests for overflow/underflow scenarios in the TrimAndSumValue functions - Change other less severe error cases to log warnings instead of returning errors. - Generally prevent flow-cancel related issues from affecting normal Governor operations. Instead the flow cancel transfers should simply not be populated and thus result in "GovernorV1" behavior. - Add documentation to CheckPendingForTime to explain the dangers of returning an error - Reword error messages to be more precise and include more relevant fields. Add documentation explaining when the process should and should not die * node: Add additional metrics for Governor status Modify the monitoring code and protobuf files to make the status of the Governor more legible when flow-canceling is enabled. This can be consumed by Wormhole Dashboard to better reflect the effects of flow cancelling. On the level of the Governor: - whether the Guardian has enabled flow cancel or not On the level of the Governor's emitters, reports 24h metrics for: - net value that has moved across the chain - total outgoing amount - total incoming flow cancel amount Currently big transfers are not accounted for as they do not affect the Governor's capacity. (They are always queued.) * node: Add new flow cancel parameter to Governor in tests * node: goimports formatting * node: Bug fix in changes to governor monitoring - Fix issue where stats weren't being populated unless flow cancel was enabled - Fix wrong return value used in unit test - Fix typo in proto variable name - Move sorting outside of a for loop for efficiency - Restore unit test that was deleted in the process of rebasing * node: address prealloc lint error in governor code * node: Fix "generated proto differs from committed proto" * node: Fix bug in chainIds allocation - This resolves a mistake with allocating the chainIds in the governor initialization that causes nil entries in the slice. - Add unit tests to ensure that the chainIds slice matches the chains map - Add unit test to ensure that TrimAndSumValueForChain checks for a nil pointer to avoid panics * node: Fix returning nil on err in governor_test.go * node: Cleanup comments in governor code * node: fix governor comment * node: enable flow cancel in governor_monitoring tests * node: Add flow cancel information to p2p heartbeat features * node: Remove outdated comment from governor * node: Upgrade logs to Error from Warn when reloading transfers from database * node: Enable flow cancel in check_query test function * node: Cleanup comments and redundant code in governor * node: Refactor how the flow cancel token list gets populated - Only populate the flow cancel tokens list once - Change default behavior to use an empty flow cancel assets list, rather than first populating the list and then clearing it - Refactor the logic around enabling the flow cancel token field for governed assets. Now it only executes if flow cancel is enabled, rather than operating over an empty slice when flow cancel is disabled - Modify devnet/testnet configs so that they are responsible for returning the correct list of flow cancelling assets * node: Add unit test for flow cancel feature flag * node: Move new Governor status proto fields from Emitter to Chain * node: lint governor_monitoring --------- Co-authored-by: Maxwell Dulin <strikeout@maxwells-mbp.lan> * Tilt: More multiple guardian changes (wormhole-foundation#4043) * wormchain: change ioutil to io and os (wormhole-foundation#3970) * ioutil to io and os * io to os * wormchain: refresh interchaintest (wormhole-foundation#3991) * Add interchaintest tests to wormchain * chore: refreshed icts * chore: updated wasm binaries for ict --------- Co-authored-by: Steve Miskovetz <steve.miskovetz@gmail.com> * node: Update token entries for devnet and testnet configs (wormhole-foundation#4045) * node: Update token entries for devnet and testnet configs The previous entries had incorrect CoinGecko symbol information. This commit updates the values in these config files to use the same data as their counterparts in the generated token list for mainnet. * node: revert devnet/testnet token address changes * ci: fix node tests fail intermittently (wormhole-foundation#4044) * ci: reveal sdk/js-query test errors * clients/js: move towards sdkV2 * CI: query-sdk-ci fails intermittently (wormhole-foundation#4046) * CI: query-sdk-ci fails intermittently * Make query-ci wait for more peers * Add new key for accountant on guardian-0 * Force to run on instance-2 * Undo instance change * Change accountant key for guardian 1 to ensure it's unique * Restore log levels * Fix solana query tests * Enable logging * Remove now-redundant query-ci tests * Node: Gateway relayer can ignore already attested return (wormhole-foundation#4052) * node: Fix missing tokens in generated token list due to outdated SDK (wormhole-foundation#4053) - The script used to generate the updated token list was using an outdated version of the TS SDK. This caused newer chains to return a result of 'undefined' when they were converted to the custom ChainId type as they were not encluded in the enum of valid chain IDs - Update the local SDK version used by the script via `npm update @certusone/wormhole-sdk` - Adds additional logging to the script to help troubleshoot similar errors in the future * sdk: update karura testnet contracts and tokenBridgeVAA (wormhole-foundation#4055) * env: add slow option to FORGE_ARGS on karura testnet * vaa: update karura testnet tokenBridgeVAA * karura: update karura testnet contracts * Terra2: Use "docker compose", not docker-compose (wormhole-foundation#4056) * ci: bump foundry to 2024-08-04 nightly * Node: Gossip Topic Split (wormhole-foundation#4000) * WIP: topic split * Add cutover support * Remove measurements that were moved to PR#3988 * Code review rework * Code review rework --------- Co-authored-by: Evan Gray <battledingo@gmail.com> * Node: Processor performance improvements (wormhole-foundation#3988) * Node: Processor db write separation * Handle additional update while writing to db * Move the broadcasting of signed VAA to the worker * Tweak signaturesToVaaFormat * Eliminate map look up in HandleQuorum * Remove unnecessary check for already submitted * Use BadgerDB batch API to store VAAs * Don't move broadcasting to worker * Speed up processing our own observation * Simplify handleMessage and broadastSignature * Code review rework * Node/P2P: Allow disabling subscribing to VAAs (wormhole-foundation#4019) * sdk: add Mantle and X Layer relayer addresses * node: Add Makefile test target for arm64 (wormhole-foundation#4057) * node: Add Makefile test target for arm64 * node: Fix comment in Makefile * Node/CCQ: Allow address wildcard (wormhole-foundation#4062) * Node/CCQ: Allow address wildcard * Code review rework * Update the docs * Node: Log fatal not making it to grafana (wormhole-foundation#4063) * Node/CCQ: Proxy log stats on timeout (wormhole-foundation#4067) * client/js: update Acala, add Snaxchain support * client/js: update wormhole-foundation/sdk (0.9.0) and certusone/wormhole-sdk (0.10.17) * client/js: update acala testnet chain id (597) * clients/js: add Snaxchain --------- Co-authored-by: Paul Noel <panoel007@gmail.com> * node: governor token list update * Node/CCQ: Solana should retry block read (wormhole-foundation#4068) * Node/CCQ: Solana should retry block read * Code review rework * Node/CCQ: Add command to verify the permissions file (wormhole-foundation#4071) * Node: Observation batching with override (wormhole-foundation#4066) * Node: Observation batching with override * Add cutover time * Code review rework * node: Update governor manual tokens list * Node: Unknown guardian tweaks (wormhole-foundation#4076) * Node: Unknown guardian tweaks * Code review rework * node/test: fix incorrect control flow in governed assets test The control flow for this function should be `continue` rather than `return`, otherwise the entire test will exit early when evaluating an ignored chain rather than skip the current iteration as intended. In practice, this test just so happened to work OK because: - the only ignored chain is Wormchain with ID 3104 - 3104 is the last entry in the slice - iteration over slices is deterministic And so the test always returned early when hitting Wormchain, but Wormchain is the last item in the list. * Github: Update Governor code owners * github: fix codeowners errors * Deploy Snaxchain mainnet * Fix governor test for Snax * Code review rework * Eth: Make registerAllChainsOnTokenBridge not require worm support (wormhole-foundation#4083) * Node/CCQ: Add rate limiting to proxy (wormhole-foundation#4080) * Node/CCQ: Add rate limiting * Code review rework * Node/CCQ: Make burst size default to one not zero * Tweak description of burst size in doc * node: Governor token list update (wormhole-foundation#4077) * node: governor token list update * Manually fix Moonbeam USDC price --------- Co-authored-by: djb15 <djb15@users.noreply.github.com> * Node/P2P: Add WithComponents (wormhole-foundation#4089) * ethereum: migrate truffle tests to forge * ethereum: rm truffle * ethereum/forge-test: clean up warnings * docs: instructions for deploying a wormchain validator added. * Node: Add cutover feature flags to heartbeats (wormhole-foundation#4092) --------- Co-authored-by: bruce-riley <96066700+bruce-riley@users.noreply.github.com> Co-authored-by: Evan Gray <battledingo@gmail.com> Co-authored-by: Dirk Brink <hello@dirk.tech> Co-authored-by: André Claro <andre.claro@gmail.com> Co-authored-by: John Saigle <4022790+johnsaigle@users.noreply.github.com> Co-authored-by: Jason Matthyser <jasonmatthyser@gmail.com> Co-authored-by: Jason Matthyser <jason@asymmetric.re> Co-authored-by: Nikhil Suri <nikhilsuri@comcast.net> Co-authored-by: djb15 <djb15@users.noreply.github.com> Co-authored-by: Adam <20446095+aadam-10@users.noreply.github.com> Co-authored-by: Paul Noel <panoel007@gmail.com> Co-authored-by: John Saigle <john@asymmetric.re> Co-authored-by: yukionfire <yukionfire@qq.com> Co-authored-by: Csongor Kiss <kiss.csongor.kiss@gmail.com> Co-authored-by: Maxwell "ꓘ" Dulin <mdulin2@zagmail.gonzaga.edu> Co-authored-by: Maxwell Dulin <strikeout@maxwells-mbp.lan> Co-authored-by: Maxwell Dulin <strikeout@Maxwells-MacBook-Pro.local> Co-authored-by: Bruce Riley <bruce@wormholelabs.xyz> Co-authored-by: Charlton Liv <170700811+charltonliv@users.noreply.github.com> Co-authored-by: Kaku <105181329+kakucodes@users.noreply.github.com> Co-authored-by: Steve Miskovetz <steve.miskovetz@gmail.com> Co-authored-by: derpy-duck <115193320+derpy-duck@users.noreply.github.com> Co-authored-by: Edgardo Quiroga <edgardo@xlabs.xyz>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There is a race condition in the node tests where the logger can get destroyed before the supervisor finishes logging on exit. This PR makes the tests in question wait for the supervisor to exit before cleaning up the logger.
Also, the tests in question were doing several defers that were not getting executed as expected because they were in a for loop, not a function. This PR changes the code so that the defers get executed between each test.